Photo-Realistic Mouth Animation Based on an Asynchronous Articulatory DBN Model for Continuous Speech
Abstract
This paper proposes a continuous-speech-driven, photo-realistic visual speech synthesis approach based on an articulatory dynamic Bayesian network model with constrained asynchrony (AF_AVDBN). To train the AF_AVDBN model, perceptual linear prediction (PLP) features and YUV features are extracted as the acoustic and visual features, respectively. Given an input speech signal and the trained AF_AVDBN parameters, an EM-based algorithm is derived to estimate the optimal YUV features, which are then combined with compensated high-frequency components to synthesize the mouth animation corresponding to the input speech. In the experiments, mouth animations are synthesized for 80 connected-digit speech sentences. Both qualitative and quantitative evaluation results show that the proposed method synthesizes more natural, clear and accurate mouth animations than the state-asynchronous DBN model (S_A_DBN).
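The abstract outlines a pipeline of acoustic feature extraction, EM-based inference of visual features, and high-frequency compensation. The following Python sketch is not the authors' implementation: the PLP front end is replaced by a simple log-spectral stand-in, the AF_AVDBN asynchrony constraints are not modelled, and all function names, state counts and feature dimensions are illustrative assumptions. It only shows the overall flow of inferring expected low-frequency YUV mouth features from acoustic features under a trained per-state Gaussian model and adding back a high-frequency residual before rendering.

```python
import numpy as np

def extract_plp_like(audio, sr, n_coeff=13):
    """Stand-in for the PLP front end: frame the waveform and take log-spectral
    coefficients (a real system would compute proper PLP features)."""
    frame_len, hop = int(0.025 * sr), int(0.010 * sr)
    n_frames = max(1, 1 + (len(audio) - frame_len) // hop)
    window = np.hanning(frame_len)
    frames = np.stack([audio[i * hop: i * hop + frame_len] for i in range(n_frames)])
    spec = np.abs(np.fft.rfft(frames * window, axis=1))
    return np.log(spec[:, :n_coeff] + 1e-8)

def infer_yuv_features(plp, ac_means, ac_vars, vis_means, n_iter=5):
    """EM-style estimate of the expected low-frequency YUV feature per frame:
    the E-step computes per-frame state posteriors under diagonal-Gaussian
    acoustic models, and the visual estimate is the posterior-weighted mean of
    the per-state YUV templates (a strong simplification of the EM derivation
    in the paper; no asynchrony constraints are modelled here)."""
    log_prior = np.zeros(len(ac_means))               # uniform state prior
    for _ in range(n_iter):
        diff = plp[:, None, :] - ac_means[None]       # (T, S, D)
        loglik = -0.5 * np.sum(diff ** 2 / ac_vars[None] + np.log(ac_vars[None]), axis=2)
        logpost = loglik + log_prior                  # unnormalised log posterior
        logpost -= logpost.max(axis=1, keepdims=True)
        post = np.exp(logpost)
        post /= post.sum(axis=1, keepdims=True)       # (T, S) state posteriors
        log_prior = np.log(post.mean(axis=0) + 1e-8)  # re-estimate the state prior
    return post @ vis_means                           # (T, YUV feature dim)

def synthesize_mouth_frames(yuv_low, hf_residual):
    """Add back the compensated high-frequency components and clip to the
    valid pixel range; the result is one YUV mouth patch per audio frame."""
    return np.clip(yuv_low + hf_residual, 0.0, 255.0)

# Toy usage with random "trained" parameters and one second of dummy speech.
sr = 16000
audio = np.random.randn(sr)
plp = extract_plp_like(audio, sr)
n_states, plp_dim, yuv_dim = 8, plp.shape[1], 32 * 32 * 3
ac_means = np.random.randn(n_states, plp_dim)
ac_vars = np.ones((n_states, plp_dim))
vis_means = np.random.rand(n_states, yuv_dim) * 255.0  # per-state mean mouth patches
frames = synthesize_mouth_frames(
    infer_yuv_features(plp, ac_means, ac_vars, vis_means),
    hf_residual=np.zeros(yuv_dim))
print(frames.shape)  # (number of audio frames, YUV feature dim)
```

In the actual AF_AVDBN system, the state posteriors would come from inference over the full DBN, including the asynchrony-constrained articulatory streams, and the high-frequency components are compensated from training frames rather than supplied externally as in this toy usage.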
Similar papers
Photo-realistic visual speech synthesis based on AAM features and an articulatory DBN model with constrained asynchrony
This paper presents a photo-realistic visual speech synthesis method based on an audio-visual articulatory dynamic Bayesian network model (AF_AVDBN) in which the maximum asynchronies between the articulatory features, such as the lips, tongue and glottis/velum, can be controlled. Perceptual linear prediction (PLP) features from the audio speech and active appearance model (AAM) features from mouth ...
Strategies and results for the evaluation of the naturalness of the LIPPS facial animation system
The paper describes the strategy and results of an evaluation of the naturalness of a facial animation system carried out with the help of hearing-impaired persons. It shows perspectives for improvement of the facial animation system, independent of the animation model itself. The fundamental thesis of the evaluation is that the comparison of presented and perceived visual information has to be performed on ba...
A coupled HMM approach to video-realistic speech animation
We propose a coupled hidden Markov model (CHMM) approach to video-realistic speech animation, which realizes realistic facial animations driven by speaker-independent continuous speech. Unlike hidden Markov model (HMM)-based animation approaches that use a single-state chain, we use CHMMs to explicitly model the subtle characteristics of audio-visual speech, e.g., the asynchrony, tempora...
Artimate: an articulatory animation framework for audiovisual speech synthesis
We present a modular framework for articulatory animation synthesis using speech motion capture data obtained with electromagnetic articulography (EMA). Adapting a skeletal animation approach, the articulatory motion data is applied to a three-dimensional (3D) model of the vocal tract, creating a portable resource that can be integrated into an audiovisual (AV) speech synthesis platform to provide...
Viseme-aware Realistic 3D Face Modeling from Range Images
In this paper, we propose an example-based realistic face modeling method with viseme control. A viseme describes the particular facial and oral positions and movements that occur alongside the voicing of phonemes. In facial animation such as speech animation and talking heads, a face model with an open mouth is often used, and the model is animated along with the speech sound by synchronizin...
Publication year: 2011